Exploring complex vowels as phrase break correlates in a corpus of English speech with ProPOSEL, a prosody and POS English lexicon

Authors

  • Claire Brierley
  • Eric Atwell
Abstract

Real-world knowledge of syntax is seen as integral to the machine learning task of phrase break prediction but there is a deficiency of a priori knowledge of prosody in both rule-based and data-driven classifiers. Speech recognition has established that pauses affect vowel duration in preceding words. Based on the observation that complex vowels occur at rhythmic junctures in poetry, we run significance tests on a sample of transcribed, contemporary British English speech and find a statistically significant correlation between complex vowels and phrase breaks. The experiment depends on automatic text annotation via ProPOSEL, a prosody and part-of-speech English lexicon.
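
The abstract does not say which significance test was used. As a minimal sketch, assuming a Pearson chi-squared test of independence on a 2x2 contingency table (word contains a complex vowel or not, word precedes a phrase break or not), the calculation could look like the following. The function name `chi_square_2x2` and all counts are hypothetical, invented for illustration; they are not data from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for a 2x2 table.

    Assumed (hypothetical) layout of the table:
        a: words with a complex vowel, followed by a phrase break
        b: words with a complex vowel, not followed by a break
        c: other words, followed by a phrase break
        d: other words, not followed by a break
    Returns (chi2 statistic, p-value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")

    # Shortcut formula for the 2x2 case, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

    # With 1 degree of freedom, chi2 is the square of a standard normal
    # variable, so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented counts, for illustration only.
    chi2, p = chi_square_2x2(120, 380, 180, 1320)
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

A small p-value would indicate that complex vowels and phrase breaks co-occur more often than independence predicts. It says nothing about the direction of any causal link.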

Similar resources

Complex Vowels as Boundary Correlates in a Multi-Speaker Corpus of Spontaneous English Speech

We have found empirical evidence of a correlation in English between words containing complex vowels (diphthongs and triphthongs) and ‘gold-standard’ phrase break annotations in datasets as apparently different as seventeenth-century verse and a Reith lecture transcript on economics from the late twentieth century. Spontaneous speech in the form of BBC radio news reportage from the 1980s again ...

Prosody resources and symbolic prosodic features for automated phrase break prediction

It is universally recognised that humans process speech and language in chunks, each meaningful in itself. Any two renditions or assimilations of a given sentence will exhibit similarities and discrepancies in chunking, where speakers and readers use pauses and inflections to mark phrase breaks. This thesis reviews deterministic and stochastic approaches to phrase break prediction, plus dataset...

ProPOSEL: A Prosody and POS English Lexicon for Language Engineering

ProPOSEL is a prototype prosody and PoS (part-of-speech) English lexicon for Language Engineering, derived from the following language resources: the computer-usable dictionary CUVPlus, the CELEX-2 database, the Carnegie-Mellon Pronouncing Dictionary, and the BNC, LOB and Penn Treebank PoS-tagged corpora. The lexicon is designed for the target application of prosodic phrase break prediction but...

ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus

We have previously reported on ProPOSEL, a purpose-built Prosody and PoS English Lexicon compatible with the Python Natural Language ToolKit. ProPOSEC is a new corpus research resource built using this lexicon, intended for distribution with the Aix-MARSEC dataset. ProPOSEC comprises multi-level parallel annotations, juxtaposing prosodic and syntactic information from different versions of the ...

ProPOSEL: a human-oriented prosody and PoS English lexicon for machine-learning and NLP

ProPOSEL is a prosody and PoS English lexicon, purpose-built to integrate and leverage domain knowledge from several well-established lexical resources for machine learning and NLP applications. The lexicon of 104049 separate entries is in accessible text file format, is human and machine-readable, and is intended for open source distribution with the Natural Language ToolKit. It is therefore s...

Publication date: 2009